Abstract

We analyze a new general representation for the Minimum Weight Steiner Tree (MST) problem which
translates the topological connectivity constraint into a set of local conditions that can be analyzed by
the so-called cavity equation techniques. For the limit case of the spanning tree, we prove that the fixed
point of the algorithm arising from the cavity equations leads to the global optimum.

1 Microsoft Research, One Microsoft Way. 98052 Redmond, WA 

2 Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy 



1 Introduction 

Given a graph with positive weights on the edges, the MST problem consists in finding a tree of minimum
weight that contains a given set of "terminal" vertices. Such a construction may require the inclusion of some
nonterminal nodes, which are called Steiner nodes. Besides its practical importance in many fields, MST
is a basic optimization problem over networks which lies at the root of computer science, being both
NP-complete [1] and difficult to approximate [2]. In statistical physics the Steiner tree problem has similarities
with basic models such as polymers and self-avoiding walks, with a non-trivial interplay between local and
global constraints, e.g. energy minimization versus global connectivity. In recent years many algorithmic
results have appeared showing the efficacy of the cavity approach for optimization and inference problems
defined over both sparse and dense random networks of constraints [3], [4], [5], [6], [7], [8]. This performance
is understood in terms of factorization properties of the Gibbs measure over ground states, which can
also be seen as the onset of correlation decay along the iterations of the cavity equations [9]. Here we take a
step further in this direction by presenting evidence for the exactness of the cavity approach for problems
having an additional rigid global constraint which couples all variables. We show that the cavity approach
can be used to derive a new algorithm [10] for MST which has exact fixed points in the limit case of the
spanning tree. More specifically, we show how the analysis of the computational tree, which characterizes
the evolution of the so-called cavity marginals, can be used to prove optimality.

2 Definitions and Problem Statement 



Consider an undirected simple graph $G = (V, E)$, with vertices $V = \{1, \dots, n\}$ and edges $E$. Let each edge
$\{i, j\}$ have weight $w_{ij} \in \mathbb{R}$. Denote the set of neighbors of each vertex $i$ in $G$ by $N(i)$. Let $U$ be a subset of
vertices called terminals. A connected subgraph $T$ of $G$ is called a Steiner tree if it has no cycle and contains
all vertices of $U$. For the special case $U = V$, the tree $T$ is called a spanning tree. The set of all Steiner
trees of the graph $G$ with terminals $U$ is denoted by $St(G, U)$.
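The definition above can be checked mechanically on small examples. The following is a minimal sketch, not part of the original paper: the function name `is_steiner_tree` and the edge-list input format are assumptions made for illustration, and the check does not verify that the edges actually belong to $E$.

```python
def is_steiner_tree(tree_edges, terminals):
    """Return True if tree_edges form a connected, cycle-free subgraph
    that contains every terminal. Hypothetical helper for illustration."""
    terminals = set(terminals)
    vertices = {v for edge in tree_edges for v in edge}
    if not tree_edges:
        # With no edges, only a single (or no) terminal can be spanned.
        return len(terminals) <= 1
    if not terminals <= vertices:
        return False

    # Union-find to detect cycles.
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for i, j in tree_edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            return False  # this edge would close a cycle
        parent[ri] = rj

    # An acyclic graph with |vertices| - 1 edges is connected.
    return len(tree_edges) == len(vertices) - 1
```

For instance, with terminals $\{0, 2\}$ the path $0 - 1 - 2$ is a Steiner tree in which vertex 1 plays the role of a Steiner node.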

The weight of the Steiner tree $T$, denoted by $W_T$, is defined by $W_T = \sum_{\{i,j\} \in T} w_{ij}$. The minimum
weight Steiner tree (MST), $T^*(U)$, is defined by $T^*(U) = \arg\min_{T \in St(G,U)} W_T$, and for spanning trees
(when $U = V$) we drop the reference to $U$ and denote it by $T^*$. The goal of this paper is to present a belief
propagation (BP) based algorithm for finding $T^*(U)$ and to analyze it. Throughout the paper, we will assume
that $T^*(U)$ is unique. If the optimum, $T^*(U)$, is not unique, then the degeneracy can be lifted by a small
random perturbation of the weights which does not change the optimum tree.



3 Algorithm and Main Result 

In this section we explain the BP algorithm for finding the minimum weight Steiner tree. Let us quickly
explain the model. This is done in more detail in [10].

3.1 The pointer-depth model 

We model the Steiner tree problem as a rooted tree (such a construction is often associated with the term
arborescence). Name the vertex $1 \in V$ the root. Each node $i$ is then endowed with a pair of variables $(p_i, d_i)$:
a pointer $p_i$ to some other node in the neighborhood $N(i)$ of $i$, and a depth $d_i \in \{1, \dots, d_{max}\}$ defined as the
distance from the root. Terminal nodes (vertices in $U$) must point to some other node in the final tree and
hence $p_i \in N(i)$. The root node conventionally points to itself. Non-root nodes either point to some other
node in $N(i)$ if they are part of the tree (Steiner and terminal nodes), or do not point to any node if
they are not part of the tree (allowed only for non-terminals), a fact that we represent by allowing a "null"
state for the pointer $p_i$, i.e. $p_i \in N(i) \cup \{\text{null}\}$. The depth of the root is set to zero, $d_1 = 0$, while for the
other nodes in the tree the depths measure the distance from the root along the unique simple path from
the node to the root.

In order to impose the global connectivity constraint for the tree, we need to impose the condition that if
$p_i = j$ then $d_j = d_i - 1$. This condition forbids cycles and guarantees that the pointers describe a tree. In
building the BP equations, we need to introduce the characteristic functions $f_{ij} = f_{ij}(p_i, d_i, p_j, d_j)$ which
impose such constraints over configurations of the decision variables $(p_i, d_i)$. For any edge we have the
indicator function $f_{ij} = g_{ij} g_{ji}$, where
$g_{jk}(p_j, d_j, p_k, d_k) = (1 - \delta_{p_k,j}(1 - \delta_{d_j,d_k-1}))(1 - \delta_{p_k,j}\,\delta_{p_j,\text{null}})$. Therefore
any set of decision variables $\{p_i, d_i\}_i$ that satisfies the condition $\prod_{(i,j) \in E} f_{ij}(p_i, d_i, p_j, d_j) = 1$ corresponds
to a Steiner tree in $St(G, U)$.
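The local conditions of the pointer-depth model are easy to verify for a given assignment. The sketch below is an illustration only: the name `is_valid_pointer_depth`, the dictionary inputs, and the use of `None` for the "null" pointer are assumptions, not notation from the paper.

```python
def is_valid_pointer_depth(neighbors, terminals, p, d, root=1):
    """Check the pointer-depth constraints edge by edge.

    neighbors: dict vertex -> set of neighbors in G
    p: dict vertex -> pointer (a neighbor, the root itself, or None for "null")
    d: dict vertex -> depth (ignored for vertices whose pointer is None)
    """
    # The root points to itself and has depth zero.
    if p[root] != root or d[root] != 0:
        return False
    for i in neighbors:
        if i == root:
            continue
        if p[i] is None:
            if i in terminals:
                return False  # terminals must belong to the tree
            continue
        j = p[i]
        if j not in neighbors[i]:
            return False  # pointers follow edges of G
        if p[j] is None:
            return False  # cannot point to a node outside the tree
        if d[j] != d[i] - 1:
            return False  # p_i = j forces d_j = d_i - 1
    return True
```

Because every pointer strictly decreases the depth, following pointers from any tree node must end at the root, which is why no cycle can pass these checks.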

3.2 BP Equations and the Algorithm 

Let us define $w_{i\,\text{null}} = \infty$ for any $i \in U$. Then the max-sum BP equations are the following:

$$\psi_{j\to i}(d_j, p_j) = -w_{j p_j} + \sum_{k \in N(j) \setminus i} \phi_{k\to j}(d_j, p_j) \tag{1}$$

$$\phi_{k\to j}(d_j, p_j) = \max_{d_k, p_k \,:\, f_{jk}(d_k, p_k, d_j, p_j) \neq 0} \psi_{k\to j}(d_k, p_k) \tag{2}$$

On a tree, $\psi_{j\to i}(d_j, p_j)$ can be interpreted as the minimum cost change of removing a vertex $j$ with forced
configuration $d_j, p_j$ from the subgraph with link $(i,j)$ already removed.

At a fixed point, one computes the marginals $\psi_j$:

$$\psi_j(d_j, p_j) = -w_{j p_j} + \sum_{k \in N(j)} \phi_{k\to j}(d_j, p_j) \tag{3}$$

and the BP guess of the optimum tree is given by $\arg\max \psi_j$.
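To make Eqs. (1)-(3) concrete, here is a small brute-force sketch for the spanning tree case $U = V$, where the "null" state never occurs. It enumerates the states $(d_j, p_j)$ explicitly instead of using an efficient message parametrization. Several choices are assumptions made for illustration and are not taken from the paper: the function name `bp_spanning_tree`, vertices numbered from 0 with root 0, a zero weight for the root's self-pointer, zero initial messages, synchronous updates, and a fixed number of iterations in place of a convergence test.

```python
NEG_INF = float("-inf")

def bp_spanning_tree(weights, n, root=0, d_max=None, iters=20):
    """Iterate Eqs. (1)-(2) and read off pointers from the marginals (3).

    weights: dict {(i, j): w_ij} for undirected edges on vertices 0..n-1.
    Returns dict vertex -> estimated pointer for every non-root vertex.
    """
    d_max = d_max or n - 1
    nbrs = {i: set() for i in range(n)}
    w = {}
    for (i, j), wij in weights.items():
        nbrs[i].add(j)
        nbrs[j].add(i)
        w[i, j] = w[j, i] = wij
    w[root, root] = 0.0  # assumption: the root's self-pointer costs nothing

    # States are (depth, pointer); the root is pinned to (0, root).
    states = {j: [(0, root)] if j == root else
                 [(d, p) for d in range(1, d_max + 1) for p in nbrs[j]]
              for j in range(n)}

    def g(j, sj, k, sk):
        # If k points to j, then d_j must equal d_k - 1.
        return sk[1] != j or sj[0] == sk[0] - 1

    def phi(psi, k, j, sj):
        # Eq. (2): best message from k among states compatible with sj.
        return max((psi[k, j][sk] for sk in states[k]
                    if g(j, sj, k, sk) and g(k, sk, j, sj)), default=NEG_INF)

    psi = {(j, i): {s: 0.0 for s in states[j]}
           for j in range(n) for i in nbrs[j]}
    for _ in range(iters):
        # Eq. (1), applied to all directed edges at once.
        psi = {(j, i): {s: -w[j, s[1]]
                        + sum(phi(psi, k, j, s) for k in nbrs[j] - {i})
                        for s in states[j]}
               for (j, i) in psi}

    pointers = {}
    for j in range(n):
        if j == root:
            continue
        # Eq. (3): marginal of vertex j, then take the best pointer.
        marg = {s: -w[j, s[1]] + sum(phi(psi, k, j, s) for k in nbrs[j])
                for s in states[j]}
        pointers[j] = max(marg, key=marg.get)[1]
    return pointers

triangle = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 3.0}
estimate = bp_spanning_tree(triangle, n=3)
```

On this triangle the unique minimum spanning tree uses the two unit-weight edges, so vertex 1 should point to the root and vertex 2 to vertex 1. On general graphs with cycles the iteration is not guaranteed to converge, so a fixed iteration count is only a placeholder.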

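For an efficient implementation of Eqs. (1)-(3) we introduce the variables
$A^d_{k\to j} = \max_{p_k \neq j, \text{null}} \psi_{k\to j}(d, p_k)$, $B^d_{k\to j} = \psi_{k\to j}(d, \text{null})$, $C^d_{k\to j} = \psi_{k\to j}(d, j)$,
$D_{k\to j} = \max_d \max\{A^d_{k\to j}, B^d_{k\to j}\}$ and $E^d_{k\to j} = \max\{C^{d+1}_{k\to j}, D_{k\to j}\}$.
This is enough to compute $\phi_{k\to j}(d_j, p_j) = A^{d_j-1}_{k\to j},\ D_{k\to j},\ E^{d_j}_{k\to j}$ for $p_j = k$, $p_j = \text{null}$ and $p_j \neq k, \text{null}$
respectively. Eqs. (1)-(2) can then be solved by repeated iteration of the following set of equations
(the right-hand side of the update for $B$ does not depend on $d$, so the superscript is dropped there):

$$A^d_{j\to i}(t+1) = \sum_{k \in N(j) \setminus i} E^d_{k\to j}(t) + \max_{k \in N(j) \setminus i} \{ A^{d-1}_{k\to j}(t) - E^d_{k\to j}(t) - w_{jk} \} \tag{4}$$

$$B_{j\to i}(t+1) = -w_{j\,\text{null}} + \sum_{k \in N(j) \setminus i} D_{k\to j}(t) \tag{5}$$

$$C^d_{j\to i}(t+1) = -w_{ij} + \sum_{k \in N(j) \setminus i} E^d_{k\to j}(t) \tag{6}$$

$$D_{j\to i}(t) = \max\big( \max_d A^d_{j\to i}(t),\ B_{j\to i}(t) \big) \tag{7}$$

$$E^d_{j\to i}(t) = \max\big( C^{d+1}_{j\to i}(t),\ D_{j\to i}(t) \big) \tag{8}$$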



Messages are initialized arbitrarily (e.g. all set to zero at time $t = 0$). Equations (4)-(8) are iterated for $t = 0, 1, \dots$
until $M(t)$ converges. At each iteration $t$ the estimated MST is computed as $T(t) = \bigcup_{j=2}^{n} \{(j, p_j(t))\}$, where
we define $p_j(t) = \arg\max_{p_j} \{\max_{d_j} \psi_j(t, d_j, p_j)\}$ and
$\psi_j(t, d_j, p_j) = \sum_{k \in N(j) \setminus p_j} E^{d_j}_{k\to j}(t) + A^{d_j-1}_{p_j\to j}(t) - w_{j p_j}$. Note
that before convergence, $T(t)$ is not necessarily a tree.
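The three values taken by $\phi_{k\to j}$ in the efficient form can be checked case by case from the two local conditions of the model: a pointer $p_k = j$ forces $d_j = d_k - 1$ and requires $p_j \neq \text{null}$, and symmetrically for $p_j = k$. The following worked sketch spells this out for a neighbor $k$ of $j$; it is a reading aid derived from the stated constraints, not a derivation quoted from the paper.

```latex
\begin{align*}
p_j = k:\quad
  & d_k = d_j - 1,\ p_k \notin \{j, \mathrm{null}\}
  && \Rightarrow\ \phi_{k\to j}(d_j, p_j)
     = \max_{p_k \neq j, \mathrm{null}} \psi_{k\to j}(d_j - 1, p_k)
     = A^{d_j - 1}_{k\to j}, \\
p_j = \mathrm{null}:\quad
  & p_k \neq j,\ d_k \text{ unconstrained}
  && \Rightarrow\ \phi_{k\to j}(d_j, p_j)
     = \max_{d} \max\{A^{d}_{k\to j}, B^{d}_{k\to j}\}
     = D_{k\to j}, \\
p_j \neq k, \mathrm{null}:\quad
  & p_k = j \text{ only if } d_k = d_j + 1
  && \Rightarrow\ \phi_{k\to j}(d_j, p_j)
     = \max\{C^{d_j + 1}_{k\to j}, D_{k\to j}\}
     = E^{d_j}_{k\to j}.
\end{align*}
```

In the first case $p_k = j$ is excluded because it would require $d_j = d_k - 1$ and $d_k = d_j - 1$ at the same time.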

One can also look at an equivalent formulation of the problem, constructed by introducing a link
representation of the pointer variables: link variables $x_{ij} \in \{0, \pm 1\}$, with $x_{ij} = 0$ if neither of $i$, $j$ points
to the other, $x_{ij} = 1$ if $i$ points to $j$ and $x_{ij} = -1$ if $j$ points to $i$. This is a natural representation for more
general versions of the Steiner tree problem, but in this paper we use the pointer-depth model.
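The translation from pointers to link variables is mechanical. A minimal sketch follows, in which the function name `link_variables` and the dictionary-based inputs are assumptions introduced for illustration.

```python
def link_variables(edges, p):
    """Map pointers to link variables on each edge (i, j):
    +1 if i points to j, -1 if j points to i, 0 otherwise."""
    x = {}
    for i, j in edges:
        if p.get(i) == j:
            x[i, j] = 1
        elif p.get(j) == i:
            x[i, j] = -1
        else:
            x[i, j] = 0
    return x
```

For a triangle on vertices 1, 2, 3 with pointers $p_2 = 1$ and $p_3 = 2$, the edges $(1,2)$ and $(2,3)$ get the value $-1$ and the unused edge $(1,3)$ gets $0$.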



3.3 Main result for spanning trees 

Although iterating Eqs. (4)-(8) provides a distributed algorithm for solving Steiner trees, our analysis
currently covers only the case of spanning trees. Therefore, throughout the rest of the paper we focus on
the case $U = V$. First let us define a notion of convergence for the algorithm.

Definition Given a set of initial conditions $\{A_{i\to j}(0), B_{i\to j}(0), C_{i\to j}(0), D_{i\to j}(0), E_{i\to j}(0)\}_{i,j}$, we say that
the BP algorithm converges to $\{(p_i, d_i)\}_i$ if the decision variables converge to $\{(p_i, d_i)\}_i$ (i.e. there exists an
integer $N > 0$ such that for all $t > N$ and all $i$: $p_i(t) = p_i$, $d_i(t) = d_i$).

Theorem 1 If the BP algorithm converges to $\{(p_i, d_i)\}_i$, then the set of edges $\{(i, p_i)\}_i$ is the minimum
spanning tree $T^*$.



Note 1. For Theorem 1 to hold we only need the equalities $p_i(t) = p_i$, $d_i(t) = d_i$ to hold for
$N < t < N + 2d_{max} + 1$.



Note 2. There are examples for which this BP algorithm does not converge and one needs to use some 
heuristics to make it converge [1] . To the best of our knowledge there is no rigorous analysis of these heuristics 
in the literature. 



4 Analysis 

Before proving Theorem 1 we quickly review the notion of a computation tree. Computation trees have
been used in most previous analyses of BP algorithms; see [11], [12] for a list of those works.

Figure 1: (i) shows a graph with 4 vertices, (ii) represents one of its computation trees, and (iii) shows an
oriented spanning tree on the computation tree.

4.1 Computation Tree. 

For any $i \in V$, let $T_i^t$ be the $t$-level computation tree corresponding to $i$, defined as follows: $T_i^t$ is a weighted
tree of height $t + 1$, rooted at $i$. All tree-nodes have labels from the set $\{1, \dots, n\}$ according to the following
recursive rules:

(a) The root has label $i$.

(b) The set of labels of the $\deg_G(i)$ children of the root is equal to $N(i)$.

(c) If $s$ is a non-leaf node whose parent has label $r$, then the set of labels of its children is $N(s) \setminus \{r\}$.

Notation. We denote a vertex $u$ of the computation tree by $[u,i]$ if it has label $i$. We also denote the root of
the computation tree $T_i^t$ by [root,i].
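The recursive rules (a)-(c) translate directly into a short unrolling procedure. The sketch below is illustrative only: the names `computation_tree` and `count_nodes` are hypothetical, the tree is represented as nested `(label, children)` tuples, and the parameter `depth` counts edges from the root to the deepest leaves, so its exact relation to the paper's $t$ depends on how "height" is counted.

```python
def computation_tree(neighbors, label, depth, parent_label=None):
    """Unroll the graph around `label` for `depth` levels.

    neighbors: dict vertex -> set of neighbors in G.
    Returns (label, children), where children is a list of subtrees.
    """
    if depth == 0:
        return (label, [])
    # Rule (b) for the root (no parent), rule (c) for every other node.
    children = [computation_tree(neighbors, k, depth - 1, label)
                for k in sorted(neighbors[label]) if k != parent_label]
    return (label, children)

def count_nodes(tree):
    """Total number of tree-nodes, counting repeated labels separately."""
    return 1 + sum(count_nodes(child) for child in tree[1])
```

On a triangle with vertices 1, 2, 3, unrolling two levels from vertex 1 gives a root with children labeled 2 and 3, whose own children are labeled 3 and 2 respectively, so the same label can appear several times in the computation tree.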

Similar to the pointer-depth model in graph $G$, we assign to each non-leaf vertex $[v,j]$ of $T_i^t$ two decision
variables $(p_v, d_v)$ with $p_v \in N([v,j])$ and $d_v \in \{1, \dots, d_{max}\}$. We call such an assignment valid if the
following constraints are satisfied:

(a) If for two neighbors $[u,j]$, $[v,k]$ in $T_i^t$, $p_v = [u,j]$, then $d_u = d_v - 1$.

(b) For any vertex $[u,j]$ in $T_i^t$ whose label is the same as the root in $G$ (i.e. $j = 1$), $d_u = 0$.

Now for any valid assignment, the subtree $T = \{([v,j], p_v)\}$ is called an oriented spanning tree of the
computation tree. Figure 1 shows a graph with one of its computation trees, and an oriented spanning
tree on it. Denote the minimum weight oriented spanning tree (MWOST) of the computation tree $T_i^t$ by
$T^*(T_i^t)$. An argument similar to the one in [11] shows that iterations of Eqs. (4)-(8) can be seen as a dynamic
programming procedure that finds the MWOST over the computation tree. Lemma 1, which comes next without
proof, is analogous to Corollary 1 of [11].

Lemma 1 The BP algorithm initialized with zero messages solves the MWOST problem on the
computation tree. In particular, for each vertex $i$ of $G$ the decision variables $(p_i(t), d_i(t))$ are exactly equal
to the decision variables $(p_i, d_i)$ corresponding to the vertex [root,i] in $T^*(T_i^t)$.

Note 3. Lemma 1 can be generalized to any unbalanced computation tree (a tree that is obtained from $T_i^t$
by removing a subset of vertices and all of their descendants) as well. For an unbalanced tree, there is a
unique set of BP initial conditions that should be used instead of the zero messages. Lemma 1 holds for
any model where BP is used and does not depend on the problems studied in this paper (see [13], [11], and
[15] for more details).

Note 4. We would like to point out that the main result holds for the BP algorithm with any initial
condition; we assume the zero initial condition just to simplify the calculations. For an arbitrary initial condition,
the BP algorithm runs over a slightly modified computation tree. The new computation tree is almost the
same as $T_i^t$, except that the leaf edges of the tree have arbitrary weights and not the $w_{ij}$'s
from $G$.

4.2 Proof of the main result 

The proof consists of two parts. First we show that, in case of convergence, the estimated MST $T(t)$ is a
spanning tree. Next we prove that this limit is in fact the minimum spanning tree.

4.2.1 Limit is a spanning tree 

First we will show that the limit of the BP algorithm is a spanning tree. 

Lemma 2 If the BP algorithm converges to $\{(p_i, d_i)\}_i$, then the set of edges $\{(i, p_i)\}_i$ is a spanning tree of
$G$.

Proof Let us denote the set of edges $\{(i, p_i)\}_i$ by $T$. From Lemma 1 (and the note after it) we
obtain the following. Since the BP algorithm converges to $\{(p_i, d_i)\}_i$, for any vertex $i$ and any radius $r_i$ one
can find a large enough computation tree with root [root,i] such that in the MWOST of that computation
tree, all of the vertices within distance $r_i$ of the root have decision variables that are dictated by $\{(i, p_i)\}_i$.
In other words, there exists a number $N_i$ such that in the MWOST of the computation tree $T_i^{N_i}$, any vertex
$[u,j]$ with distance less than $r_i$ from [root,i] has $d_u = d_j$ and $p_u = [*, p_j] \in N([u,j])$.

Now consider the MWOST $T^*(T_i^{N_i})$. It consists of many connected pieces. Let $A_i$ be the connected
component of $T^*(T_i^{N_i})$ that contains [root,i]. Note that each edge of $A_i$ corresponds to some $([u,j], p_u)$
by definition. We list and prove a few properties of the subtree $A_i$:

(i) $A_i$ has bounded radius. All vertices of $A_i$ are within distance at most $2d_{max}$ from [root,i].
Proof. Consider the unique path $P([u,j], [\text{root},i])$ in $A_i$ that connects a vertex $[u,j]$ to [root,i]. The
depth variable along the path $P([u,j], [\text{root},i])$ either always increases by one (thus $|P([u,j], [\text{root},i])| \le d_{max}$),
or it always decreases by one until it reaches zero and then increases by one up to $d_{max}$ (in which case
$|P([u,j], [\text{root},i])| \le 2d_{max}$).

(ii) $A_i$ has no duplicate vertex. No two vertices of $A_i$ have the same label from the set $V$. That is,
no two vertices of the form $[u,j]$ and $[v,j]$ belong to $A_i$.

Proof. Assume the contrary, and let $[u,j]$ and $[v,j]$ be two such vertices in $A_i$ which have the smallest
depth variables $d_u = d_v$ (note that by property (i) both $[u,j]$ and $[v,j]$ are within distance $2d_{max}$ of
[root,i], which shows $d_u = d_v = d_j$). First assume $j \neq 1$. Consider the vertices of the computation tree
that are pointed to by $[u,j]$ and $[v,j]$ (i.e. $p_u = [u', p_j]$ and $p_v = [v', p_j]$). By design both $[u', p_j]$ and
$[v', p_j]$ belong to $A_i$ since they are connected to $[u,j]$ and $[v,j]$ respectively, and $d_{u'} = d_u - 1 = d_j - 1$,
$d_{v'} = d_v - 1 = d_j - 1$. Hence we should have $[u', p_j] = [v', p_j]$ (by the definition of $[u,j]$ and $[v,j]$, which
have the smallest value of $d_u = d_v$). But this means the vertex $[u', p_j]$ of the computation tree has two distinct
neighbors $[u,j]$ and $[v,j]$ with the same label, which is a contradiction because the computation tree
and $G$ have the same local structure at any non-leaf vertex. The case $j = 1$ is trivially impossible since
the depth variable along the path between $[u,j]$ and $[v,j]$ would have to go from zero to zero.

(iii) $A_i$ has all labels from $V$. First note that $A_i$ has a vertex with label 1: starting from $i$ and
following the pointers, the depth variable is decreasing and becomes zero at some point. The vertex
which has depth zero is in $A_i$ and has to have label 1. Now we show that for any $j \in V$ there exists a
vertex $[u,j] \in A_i$. Consider the sequence $S = j, p_j, p_{p_j}, p_{p_{p_j}}, \dots$. This sequence has to stop at 1 since
the depth variable for elements of the sequence is strictly decreasing. So it eventually intersects the labels
that appear in $A_i$. Consider the first time that the intersection happens (for an element $[u,k]$ of $A_i$ we
have $k \in S$). Let $\ell$ be the element before $k$ in $S$ (i.e. $p_\ell = k$). We prove that $\ell$ is also a label in $A_i$. This
is because $[u,k]$ has the same local structure in the computation tree as $k$ in $G$, and $\ell$ is a neighbor
of $k$ in $G$. Thus there exists a $[v,\ell] \in N([u,k])$, and the distance between $[v,\ell]$ and [root,i] is at most
$2d_{max} + 1$. So $p_v = p_\ell$ and $d_v = d_\ell$. This means that $[v,\ell]$ is connected to $[u,k]$ and hence is in $A_i$.
Repeating the process, we obtain that $j$ is a label in $A_i$.

Properties (i)-(iii) show that under $[u,j] \to j$ the tree $A_i$ is isomorphic to $T = \{(p_i, d_i)\}_i$ and therefore $T$ is
a spanning tree of $G$. ∎

4.2.2 Limit is the minimum weight spanning tree 

To prove that the set $T = \{(p_i, d_i)\}_i$ is the minimum spanning tree we assume the contrary ($T \neq T^*$).
Then we will construct an oriented spanning tree $T'(T_i^{N_i})$ that has less weight than $T^*(T_i^{N_i})$, which is a
contradiction.

For our proof, we need a quick review of Prim's well-known algorithm [16] for finding the minimum
spanning tree of the graph $G$. The algorithm continuously increases the size of a tree, starting with a single
vertex, until it spans all the vertices. It starts from an initial subtree $T_0$ of $G$ that contains a single vertex.
Then for $r = 0, 1, \dots, n-2$ the following step is repeated: find the minimum weight edge $(u,v)$ that
connects $T_r$ to $G \setminus T_r$ and set $T_{r+1} = T_r \cup \{(u,v)\}$. The tree $T_{n-1}$ is the minimum spanning tree.
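The step described above can be sketched in a few lines. This is a standard heap-based variant of Prim's algorithm written for illustration; the function name `prim_mst`, the 0-based vertex numbering, and the edge-dictionary input are assumptions, and the graph is assumed to be connected.

```python
import heapq

def prim_mst(n, weights, start=0):
    """Return the edges of a minimum spanning tree in the order Prim adds them.

    weights: dict {(i, j): w_ij} for undirected edges on vertices 0..n-1.
    """
    adj = {v: [] for v in range(n)}
    for (i, j), w in weights.items():
        adj[i].append((w, i, j))
        adj[j].append((w, j, i))

    in_tree = {start}
    heap = list(adj[start])  # candidate edges leaving the current tree
    heapq.heapify(heap)
    order = []
    while heap and len(in_tree) < n:
        w, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue  # stale entry: both endpoints are already in the tree
        in_tree.add(v)
        order.append((u, v))  # lightest edge connecting the tree to the rest
        for edge in adj[v]:
            if edge[2] not in in_tree:
                heapq.heappush(heap, edge)
    return order
```

The returned list plays the role of the ordered edge sequence added by the algorithm, which is the ordering the proof relies on.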

Assume that Prim's algorithm starts with the vertex 1. Let $e_1, e_2, \dots, e_{n-1}$ be the order of the edges that
are added during the algorithm. That is, $T_r = \{e_1, e_2, \dots, e_r\}$. Now let $e_k$ be the first edge that does not
belong to $T$. The subgraph $T \cup \{e_k\}$ has a cycle. Thus it has an edge $e$ in $T$ that connects $T_{k-1}$ to the
outside of $T_{k-1}$. By Prim's algorithm, $w(e) \ge w(e_k)$. The inequality is strict since $T^*$ is unique.

Let $T' = (T \cup \{e_k\}) \setminus \{e\}$. It is not hard to see that $T'$ is also a spanning tree of $G$ and $w(T') < w(T)$.
Consider the pointer-depth representation of the tree $T'$ and denote the corresponding decision variables
by $\{(p'_i, d'_i)\}_i$. Let also $(x, p'_x)$ correspond to the edge $e_k$ in this new pointer-depth representation. Since
$1 \in T_{k-1} \subset T \cap T'$, for any $i \in T_{k-1}$ we have $(p_i, d_i) = (p'_i, d'_i)$.
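The weight comparison in this exchange step can be written out explicitly. The sketch below uses only the facts stated above: $e_k$ is the edge Prim's algorithm selects when leaving the current tree, $e$ is another edge of $T$ leaving that same tree, and the optimum is unique.

```latex
\begin{align*}
w(T') &= w(T) + w(e_k) - w(e), \\
w(e_k) < w(e) \;&\Longrightarrow\; w(T') < w(T).
\end{align*}
```

Removing $e$ from the unique cycle of $T \cup \{e_k\}$ keeps the graph connected with $n - 1$ edges, which is why $T'$ is again a spanning tree.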

Now we consider the oriented spanning tree $T^*(T_i^{N_i})$. Similar to the previous section, let $A_i$ be the connected
component of $T^*(T_i^{N_i})$ that contains [root,i]. Let $[u,x] \in A_i$ be the unique vertex that has label $x$. We
change the decision variables of any vertex $[v,j]$ of $A_i$ from $(p_v, d_j)$ to $(p'_v, d'_j)$, where $p'_v$ is the unique
vertex in $N([v,j])$ that has label $p'_j$. Denote the new subgraph of the computation tree by $T'(T_i^{N_i})$. Clearly
$w(T'(T_i^{N_i})) < w(T^*(T_i^{N_i}))$. Now we only need to show that $T'(T_i^{N_i})$ is an oriented spanning tree of the
computation tree to achieve a contradiction.

Since $T' \setminus T = \{e_k\}$, we only need to check that the local constraints at the edge $([u,x], p'_u)$ of $T'(T_i^{N_i})$ satisfy
those of an oriented spanning tree. Note that all neighbors of the vertex $[u,x]$ are within distance
$2d_{max} + 1$ of [root,i]. Thus if $p'_u = [v, p'_x]$ then $(p_v, d_v)$ will be equal to $([*, p_{p'_x}], d_{p'_x})$. On the other hand,
$p'_x$ is a vertex in $T_{k-1}$, and for all vertices of $T_{k-1}$ the decision variables $(p, d)$ and $(p', d')$ are the same. Thus
$(p'_u, d'_u)$, $(p'_v, d'_v)$ will satisfy the local constraints since $(p'_x, d'_x)$, $(p'_{p'_x}, d'_{p'_x})$ satisfy the same constraint in $T'$.
Therefore we obtain a new oriented spanning tree of the computation tree which has weight less than the
optimum, $T^*(T_i^{N_i})$, which is a contradiction. So the assumption $T \neq T^*$ was incorrect. ∎